Skip to content

Initialize with a list of ZoneIds. Fixes #180#182

Open
PhotoKevin wants to merge 4 commits intoRomanIakovlev:masterfrom
PhotoKevin:master
Open

Initialize with a list of ZoneIds. Fixes #180#182
PhotoKevin wants to merge 4 commits intoRomanIakovlev:masterfrom
PhotoKevin:master

Conversation

@PhotoKevin
Copy link
Contributor

commit cdab7ad
Merge: d025c9f ddcaf60
Author: kevin kevin@blackholeofphotography.com
Date: Sun Jan 4 19:59:49 2026 -0500

Merge branch 'InitializeByZoneNames' of https://github.com/PhotoKevin/timeshape into InitializeByZoneNames

commit d025c9f
Author: kevin kevin@blackholeofphotography.com
Date: Sun Jan 4 19:59:43 2026 -0500

Added an example of the new initializer method.

commit ddcaf60
Merge: c54ff64 ff837e7
Author: Kevin kevin@blackholeofphotography.com
Date: Sun Jan 4 19:48:28 2026 -0500

Merge branch 'RomanIakovlev:master' into InitializeByZoneNames

commit c54ff64
Author: kevin kevin@blackholeofphotography.com
Date: Sat Jan 3 13:34:06 2026 -0500

Parameter order on assertEquals was backwards

assertEquals parameters were reversed. Changed to expected, actual.

commit 180966d
Author: kevin kevin@blackholeofphotography.com
Date: Sat Jan 3 13:31:18 2026 -0500

New initialization functions that use a list of ZoneIds.

Added initialization functions to use a lists of ZoneIds. Users may find it easier to determine which time zones they're interested in instead what the bounding box is for, say, all of Europe.

Added documentation comments for parameters.

commit cdab7ad
Merge: d025c9f ddcaf60
Author: kevin <kevin@blackholeofphotography.com>
Date:   Sun Jan 4 19:59:49 2026 -0500

    Merge branch 'InitializeByZoneNames' of https://github.com/PhotoKevin/timeshape into InitializeByZoneNames

commit d025c9f
Author: kevin <kevin@blackholeofphotography.com>
Date:   Sun Jan 4 19:59:43 2026 -0500

    Added an example of the new initializer method.

commit ddcaf60
Merge: c54ff64 ff837e7
Author: Kevin <kevin@blackholeofphotography.com>
Date:   Sun Jan 4 19:48:28 2026 -0500

    Merge branch 'RomanIakovlev:master' into InitializeByZoneNames

commit c54ff64
Author: kevin <kevin@blackholeofphotography.com>
Date:   Sat Jan 3 13:34:06 2026 -0500

    Parameter order on assertEquals was backwards

    assertEquals parameters were reversed. Changed to expected, actual.

commit 180966d
Author: kevin <kevin@blackholeofphotography.com>
Date:   Sat Jan 3 13:31:18 2026 -0500

    New initialization functions that use a list of ZoneIds.

    Added initialization functions to use a lists of ZoneIds. Users may find it easier to determine which time zones they're interested in instead what the bounding box is for, say, all of Europe.

    Added documentation comments for parameters.
Comment on lines 211 to 213
ZoneId zoneId = ZoneId.of(zoneIdName);
getPolygons(f).forEach(polygon -> {
if (timeZones.contains (zoneId)) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you could move the timeZones.contains(zoneId) check outside of the getPolygons(f).forEach loop, because it doesn't depend on the polygon, only on the zoneId.

Suggested change
ZoneId zoneId = ZoneId.of(zoneIdName);
getPolygons(f).forEach(polygon -> {
if (timeZones.contains (zoneId)) {
ZoneId zoneId = ZoneId.of(zoneIdName);
if (timeZones.contains (zoneId)) {
getPolygons(f).forEach(polygon -> {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. It came from copying the original method.



@SuppressWarnings("SizeReplaceableByIsEmpty")
static Index build(Stream<Geojson.Feature> features, int size, List<ZoneId> timeZones, boolean accelerateGeometry) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Set works better here.

Suggested change
static Index build(Stream<Geojson.Feature> features, int size, List<ZoneId> timeZones, boolean accelerateGeometry) {
static Index build(Stream<Geojson.Feature> features, int size, Set<ZoneId> timeZones, boolean accelerateGeometry) {

* @return an initialized instance of {@link TimeZoneEngine}
*/

public static TimeZoneEngine initialize(List<ZoneId> timeZones,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method looks like mostly a duplicate of the existing initialize method, with the exception of added timeZones parameter.

If you look at other initialize overloads, they don't duplicate the initialization logic, but rather make calls the initialize(double minLat, double minLon, double maxLat, double maxLon, boolean accelerateGeometry, TarArchiveInputStream f) overloaded method with the default values for parameters they do not accept. Would it be possible to do the same for this new overload?

This approach requires providing the default values for the parameters, and for timeZones I can think of 3 options for default values:

  1. Set of all the time zones supported by JVM, can be done with https://docs.oracle.com/javase/8/docs/api/java/time/ZoneId.html#getAvailableZoneIds--. One downside is that it returns Set<String>, not Set<ZoneId>, so we'd have to fall back to strings equality check internally, but we should still expose ZoneId as the parameter type.
  2. Use empty set as default value and treat it specially in the Index.build method (if it's empty then make no checks on it, just include all the time zone from the data).
  3. Replace inclusion check with exclusion in Index.build, and use a complement of all the time zone ids as the default value. Advantage compared to option 2 is that there's no need to special-case the empty set in Index.build.

What do you think of these options? I personally lean towards the option 2, as it incurs the least amount of overhead during initialization, but I'm also open to 1 and 3, since I don't believe checking a set membership requires a lot of compute.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. It could be also done by converting the Set to Set before passing. More time, more memory, but tiny compared to the overall size of parsing the data files.

2 This seems best to me, although maybe as a null parameter instead of an empty set. I could go either way.

  1. I'm not sure of the overhead of checking a set membership, probably in the same realm as hoisting that if statement out. I'd have to look at the underlying implementation to be sure. I'd guess it's either a sorted array or a hash table since sets have to check on duplicates and that would hurt. But like option 1, it'd be lost in the noise of decompressing and reading data.tar.zstd.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok let's go for option 2 then, but I'd prefer to avoid nulls and rather use Optional<Set<ZoneId>>.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I implemented this, I came to the conclusion that I've missed your point. Why have the time zones be optional? If someone doesn't want to specify a set of zones and instead take in the entire planet then they can call

public static TimeZoneEngine initialize()

or

public static TimeZoneEngine initialize(boolean accelerateGeometry)

Unless you mean to make an initialize that takes both a bounding box and a set of time zones and then decides which to use depending if the time zone set is present, or not.

The 'real' methods are:

    public static TimeZoneEngine initialize(double minLat,
                                            double minLon,
                                            double maxLat,
                                            double maxLon,
                                            boolean accelerateGeometry,
                                            TarArchiveInputStream f) {
public static TimeZoneEngine initialize(Optional<Set<ZoneId>> timeZones,
                                            boolean accelerateGeometry,
                                            TarArchiveInputStream f) {

Both of those do have duplicate code on the featureStream Spliternator which could probably be refactored out. But everything else is a layer around those two.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I was talking about adding the Optional<Set<ZoneId>> to the main initialize method, so that it looks like:

    public static TimeZoneEngine initialize(double minLat,
                                            double minLon,
                                            double maxLat,
                                            double maxLon,
                                            boolean accelerateGeometry,
                                            TarArchiveInputStream f,
                                            Optional<Set<ZoneId>> timeZones) {

However now that I look at this signature, it seems that the bounding box params (min-max lat-lon ones) don't make a lot of sense in the same method as the timeZones, so you're right, they should probably be 2 separate methods as you suggested in your previous comment. So if you can factor out the Spliterator and related code to avoid duplication between the two, it would be the best outcome.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, a user would either use a bounding box, or a set of ZoneId but never both at the same time. We're totally on the same page now.

It'll be couple days before I can get to the Spliterator change, I'm heading out of town.

* @param accelerateGeometry Increase query speed at the expense of memory utilization
* @return an initialized instance of {@link TimeZoneEngine}
*/
public static TimeZoneEngine initialize(List<ZoneId> timeZones, boolean accelerateGeometry) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same comment regarding the overloads and code duplication as above. If we can figure out a good default value for the timeZones, we won't need this method at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the subject of duplication, I'm going to put the number of time zones into a constant at the top of the file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants